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Abstract 

Air fluorescence detectors traditionally determine the dominant chemical composi- 
tion of the ultrahigh energy cosmic ray flux by comparing the averaged slant depth 
of the shower maximum, X max , as a function of energy to the slant depths expected 
for various hypothesized primaries. In this paper, we present a method to make 
a direct measurement of the expected mean number of protons and iron by com- 
paring the shapes of the expected X max distributions to the distribution for data. 
The advantages of this method includes the use of information of the full distri- 
bution and its ability to calculate a flux for various cosmic ray compositions. The 
same method can be expanded to marginalize uncertainties due to choice of spectra, 
hadronic models and atmospheric parameters. We demonstrate the technique with 
independent simulated data samples from a parent sample of protons and iron. We 
accurately predict the number of protons and iron in the parent sample and show 
that the uncertainties are meaningful. 
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1 Introduction 



Measurements of the energy spectrum, composition and arrival direction dis- 
tributions elicit clues to the origin of cosmic rays. Since a direct measurement 
and identification of the cosmic ray primaries is not possible at these ener- 
gies, we depend on indirect methods to understand the chemical composition 
of the cosmic ray flux. The methods vary with the detector type. In ground 
array-type cosmic ray detectors, composition analyses typically focus on the 
energy-dependence of the muon to charged particle ratio, which is thought 
to signify the change in primary particle composition as a function of en- 
ergy [1]. Air fluorescence experiments, on the other hand, image the shower 
development in the atmosphere and obtain information about the primary 
composition from X max , the slant depth at which the particle count in the 
cosmic ray air shower is a maximum. 

Showers induced by protons tend to reach the maximum of the shower devel- 
opment at larger slant depths than showers induced by heavier elements. Since 
shower-to-shower fluctuations are large, a determination of the primary type 
for individual showers is not possible, but averaged quantities like the so-called 
elongation rate have been successfully used to understand the composition of 
the cosmic ray flux on a statistical basis. The elongation rate is defined as the 
slope of the mean height of shower maximum measured in units of slant depth, 



Corresponding author, E-mail: connolly@nevis.columbia.edu 



< X max >, versus the logarithm of the primary cosmic ray energy observed. 
The measured elongation rate is compared to that of simulated proton and 
iron primaries to assess whether the cosmic ray composition is proton-like or 
heavier [2,3,4]. 

Due to the relative insensitivity of ultrahigh energy cosmic ray detectors to 
small shifts in composition between various low-Z or high-Z primaries, their 
observations have historically been compared to what is expected for "pro- 
tons" and "iron," with the two elements as stand-ins for all light and heavy 
elements. For < X max > (see Section 4), we expect ~ 100g/cm 2 difference be- 
tween protons and iron. However, as shown in Fig. 1, large shower-to-shower 
fluctuations and the detector resolution cause a significant overlap in their 
distributions (~ 40 — 70g/cm 2 ). 

Here, we present a method to answer the question: in a fluorescence data sam- 
ple where one has measured X max for each event, what is the best estimate 
for the expected mean 2 number of protons and iron from that experiment? 
This question is not directly answered through other methods. Once the an- 
swer to this question is obtained, the mean number can be converted to a flux 
for protons and iron. This method can be expanded to evaluate the contri- 
bution of any chemical element. Further, this method uses more information 
than averaged quantities like the elongation rate; it makes use of the shape of 
the X max distribution rather than simply its mean. Lastly, the technique al- 
lows the introduction of systematic uncertainties and prior knowledge or lack 
thereof concerning the composition spectra, hadronic generators, parameters 
concerning the extensive air shower and atmospheric uncertainties. 

In brief, the technique for extracting the cosmic ray composition essentially 
requires us to find the best normalizations for the "hypothesized" X max dis- 
tributions for protons and iron. By "best" we mean the sum of the expected 
mean events for the hypothesized distributions that best match the expected 
mean for the X max distribution for data. The chance that the combined hy- 
pothesized distributions match the parent distribution for data is proportional 
to the likelihood, l{D\p, /). The likelihood l(D\p, f) is maximized for the best 
estimate for the expected mean number of protons (p) and iron (/) where D 
denotes the data sample. The variables p and / sum to the expected mean 
number of data events, p + / = d. Note the expected means for protons and 



2 Here we will refer to the mean although we are taking a Bayesian approach. The 
number of protons, iron, etc. predicted by the likelihood can better understood as 
the best estimate of the "true" number of protons and iron given the universe's 
cosmic ray flux and the HiRes aperature rather than the mean number expected 
for an infinite number of HiRes experiments. (Here the universe's cosmic ray flux is 
assumed to be homogeneous and isotropic as there is currently no evidence to the 
contrary. However, there is nothing to prevent us binning X max as a function of sky 
coordinates.) Regardless, for brevity, we use frequentist language. 
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Fig. 1. Simulated cosmic ray events show shower-to-shower fluctuations which cause 
a large overlap between proton (black) and iron (red) events when plotted as a func- 
tion of X max and log(E/eV). The QGSJet hadronic model was used under the COR- 
SIKA extensive air shower simulator to model the shower profiles. These events were 
then passed through a simulation of the HiRes stereo detector response. 



iron add to the expected mean number of events and not to the total events 
in the given data sample. 



We develop the method using simulated stereoscopic data for the High Reso- 
lution Fly's Eye (HiRes) experiment in Utah. The method is tested with many 
independent data sets that originate from a simulated parent data set with 
Ptme proton events generated with an E~ ap spectrum and f true iron events 
generated with an E~ a f spectrum. Each bin of this parent distribution is fluc- 
tuated according to Poisson statistics to build a set of independent simulated 
data distributions with which we test the method of predicting p and /. A 



general description of this technique is given in [5] . To include typical instru- 
mental uncertainties in our study, we will apply the method to a particular 
cosmic ray experiment, the HiRes stereo experiment. All simulated showers 
used in this paper have been subjected to the full HiRes stereo simulation and 
reconstruction. 

The paper is organized as follows: Section 2 gives an overview of the HiRes 
detector, which we use as the detector in the simulated data. Section 3 gives 
a description of the likelihood method used to find the most likely mixture of 
protons and iron in the simulated data. In Section 4, the likelihood method 
is first modified to account for our ignorance of the proton and iron spectra, 
and then demonstrated with simulated data samples. Finally, Section 5 states 
conclusions and results. 



2 The HiRes Detector 



HiRes is an air fluorescence experiment with two sites (HiRes 1 & 2) at the US 
Army Dugway Proving Ground in the Utah desert (112° W longitude, 40° N 
latitude, vertical atmospheric depth 860g/cm 2 ). The two sites are separated 
by a distance of 12.6 km. 

Each of the two HiRes "eyes" comprises several telescope units monitoring 
different parts of the night sky. With 22 (42) telescopes with 256 photomulti- 
plier tubes each at the first (second) site, the full detector covers about 360° 
(336°) in azimuth and 3° — 16.5° (3° — 30°) in elevation above horizon. Each 
telescope consists of a mirror with an area of about 5 m 2 for light collection 
and a cluster of photomultiplier tubes in the focal plane. 

A cosmic ray primary interacting in the upper atmosphere induces an extensive 
air shower which the detectors observe as it develops in the lower atmosphere. 
The photomultiplier tubes triggered by the shower define an arc on the sky, 
and, together with the position of the detector, the arc determines the so-called 
shower-detector plane. When an air shower is observed in stereo, the shower 
trajectory is in principle simply the intersection of the two planes. This method 
can be further improved by also taking advantage of the timing information of 
the tubes, and in our analysis the shower geometry is determined by a global 
X 2 minimization using both the timing and pointing information of all tubes. 

The next step in the reconstruction is to calculate the shower development 
profile. However, light arriving at the detector is collected by discrete PMTs, 
each of which covers about 1° x 1° of the sky. The signal from a longitudinal 
segment of the EAS is thus necessarily split among many PMTs. For profile 
fitting, the signal must be re-combined into bins that correspond to the Ion- 



gitudinal segments of the EAS in HiRes 1 and HiRes 2. The re-binned signal, 
corrected for atmospheric extinction and with Cerenkov light subtracted is fit 
to a Gaisser-Hillas functional form (Eq. 1) using a \ 2 that fits the function to 
the profiles measured in the two detectors. This form has been shown to be in 
good agreement with EAS simulations [6,7,8] and with HiRes data [9]. 



V" V" \ {Xmax—Xoj/X 

N(X) = N max ( -?- exp 



A 



From measurements of laser tracks and stars in the field of view of the cameras 
we estimate that the systematic error in the arrival direction determination is 
not larger than 0.2°, mainly caused by uncertainties in the survey of mirror 
pointing directions. 

Various aspects of the HiRes detector and the reconstruction procedures are 
described in [10,11,12]. 



3 The Likelihood 



To calculate the best estimate of p and /, the mean number of protons and 
iron expected (at a HiRes experiment), we maximize l(D\p, /), the likelihood 
of the data sample D given p and /. To this end, Bayes' theorem allows us to 
convert the probability of a particular data sample D given the composition 
mixture, P(D\p,f), to the probability of the composition mixture given the 
data, P(p,f\D): 

Pip f\D) = ^IP»/)g(P»/) {2 ) 



In this expression, q(p, /) is the prior probability of p and /. Initially, the 
prior probability will contain information about the expected distributions for 
protons and iron and their respective energy spectra. However, in principle, it 
can contain any number of (un)knowns concerning the hadronic generator, un- 
certainties in atmospheric parameters, etc. We effectively maximize P(p, f\D) 
by maximizing l(D\p, f) = P(D\p, f)q(p, /). 

To calculate l(D\p,f), we divide the X max distributions into iV bins. We let 
Pi and Fi be the number of events in the i th bin of the simulated proton and 
iron samples, respectively, and Di be the number of data events in the i th 
bin. (The capitalized quantities are ones which we know at the outset of the 
calculation.) 




While we typically generate samples of simulated events that are much larger 
than the data sample, the sets Pi and F$ nevertheless represent single instances 
or fluctuations around the true, unknown means pi and /$ for the simulated 
samples. Similarly, the number of data events D t in the i th bin fluctuates 
around the true mean count di. Assigning a Poisson probability to the counts 
in each bin Di, Pi, and Fi, we can write the likelihood function as: 



KD\{di},m,{fi}) = n (%r- 1 ( ^^— ) ( yjll ^r- 1 ( 3 ) 



The actual values for pi and fi will not interest us; these are nuisance pa- 
rameters which we will eventually eliminate. However, the mean number of 
data counts expected in the i th bin is expressed as a weighted sum of these 
parameters: 



di = e p pi + e f fi. (4) 

The purpose of the weights e p and Ef is to 1) scale down the (presumably 
larger) simulated sample sizes to the data sample size, and 2) set the relative 
mixture of protons and iron expected in the data. Hence, the quantities which 
we want to estimate are e p and e/. Inserting Eq. (4) into Eq. (3) and marginal- 
izing (integrating out) the nuisance parameters pi and fi we define the global 
likelihood function l(D\e p ,€f) as: 



N bins 



l(D\e p ,e f ) = J[ I d Pi I dfi 

i=\ 



' ( e p pj + ejfjf^vn+ejf i) ' 
A! 

>< f^F) f ^ i ■ < 5 » 



In fact, the integration can be performed exactly (see [5] for details), and 
reduces to: 



N Di /p , jj _ \ / \Di-n 

'pi^c/)=nEf ' ; 



,=i„=ov Di-n ;(l + ^+A-«+i 



In practice, then, the function we maximize is not l(D\p, /) directly; rather, we 
maximize l(D\e p ,€f) (or, more precisely, we minimize — log l(D\e p ,€f)) with 
respect to both e p and e/. 

With our best estimates for e p and e/, we can now estimate the mean number 
of proton and iron events p and / expected for the experiment. The estimates 
suggested in [5] are 

N 

P = e p (Y / P i + N) (7) 

i=i 
and 

N 

f = ef(£ F i + N ) (8) 

i=i 

which of course reduce to the more obvious estimates p = e p J2Pi and / = 

€f J2 Fi i n the usual situation when the number of simulated events is much 

greater than the number of bins in X max . 



4 Demonstrating the Method 



To demonstrate the method, we create many simulated data samples from 
a single parent distribution made with ptrue proton and ft rue iron simulated 
events with energies in excess of 10 18 ' 5 eV. First, we generate a library of show- 
ers of various energies and primary particles with the CORSIKA simulator 
using the QGSJET hadronic model. For a given composition, these showers 
are selected randomly according to their energy spectrum and then simulated 
in the detector with a random geometry. We expect the energy spectra to 
have an energy dependence between E~ 2 and E~ 3 . In our first application of 
the method, we will pick a p = 2 and «/ = 3, a choice motivated by previous 
composition analyses [3]. This will just serve as an example; we will drop this 
assumption in the next section, where we show how to eliminate the spec- 
tral dependence. The simulated data samples are subject to the same quality 
cuts that we would apply to real data (see e.g. [13]). We require a minimum 
track length of 3° in each detector, an estimated angular uncertainty in both 
azimuth and zenith angle of less than 2°, and a zenith angle less than 70°. 
We additionally require an estimated energy uncertainty of less than 20% and 
% 2 /dof < 5 for both the energy and the geometry fit, and that X max appear 
in the field of view one of the detectors. Lastly, the reconstructed primary 
particle energy must be greater than 10 19 eV. 

After applying the quality cuts to the parent sample, we fluctuate each bin 
in the X max distribution in the parent data sample many times according to 
Poisson statistics to obtain many independent data samples. For the moment, 
we assume that we know that a p = 2 and a.j — 3. If the method works, we 
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Fig. 2. Iron and proton X m , 
with energies above 10 19 eV. 



distributions generated with E 2 and E 3 spectra 



expect the means for p max — Ptrue and f max — f true to be 0, and the calculated 
uncertainties for p and / should be meaningful; that is, the quoted values for 
p and / are within 1 a of pt rue and f trU e in 68% of the data samples. Figure 2 
shows the proton and iron distributions used to calculate the likelihood. We use 
the same events for generating the parent distribution and for the hypothesized 
distributions in the likelihood. We generate 2571, 773, 665 and 1121 events for 
E~ 2 protons, E~ 3 protons, E~ 2 iron and E~ 3 iron. A similar but independent 
set of distributions were combined to build the 'fake' data sample. 



We first test the method with a mixture of 50% protons and 50% iron. Figure 3 
shows — log I (D\p, fmax) and — log I (D\p max , f), for one simulated data set 
with 86 protons and 117 iron events fluctuated from a parent sample of 100 
proton and 100 iron events; p ma x (fmax) is where l(D\p, f) is maximized with 
repect to p (/). From the parabolic shape of these distributions one might 
suspect that the 1 a uncertainties for p and / can be calculated through the 
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Fig. 3. - log I (D\p,f max ) (top) and -log I (D\p max , f) (bottom) evaluated for one 
simulated data sample with 86 proton and 117 iron events fluctuated from 100 proton 
and 100 iron events. Although regions about the minima are parabolic, the distribu- 
tions have substantial tails. 



equations 



and 



log/(jWr ±0- p ,fr, 



log l(p r , 



+ 



- log l(p max , fmax ± CTf) = ~ log l(f„ 



(9) 



(10) 



where a p and o~f are the uncertainties for p and /, respectively. However, 
we find that l(D\p max , f) and l(D\p, f max ) have substantial tails and so their 
uncertainties have to be calculated numerically. 

Figure 4 shows the corresponding distributions for protons and iron normalized 
to the number of predicted events, the total predicted mean and the simulated 
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Fig. 4. The distributions for protons and iron normalized to the number of predicted 
events, the total predicted mean, and a simulated data set of 86 proton and 111 iron 
events created from a parent distribution of 100 proton and 100 iron events. The \ 2 
between the parent data set and the predicted sum is 6. 1 with 8 degrees of freedom. 



data set. We calculate the x 2 f° r this fit to be 6.7 with 8 degrees of freedom 
while keeping in mind that the total events predicted by the sum of p and / 
are not meant to predict the total number of events, but rather the expected 
mean number of events. 



To artificially separate the systematic and statistical uncertainties, one can, for 
example, remove any prior knowledge concerning the uncertainties in our mod- 
els for protons and iron (i.e. remove q({Pi}, {/»}, £pP, £//) from 



11 



l(D\{pi}, {/}, e p p, eff)) and define statistical uncertainty as 

N r r d Di e~^ 
J dpi J df 



N f ,. jDi -di 

istat(D\ P j) = n/ dpi] dff^j- (n; 



Then, by calculating the total uncertainty (a to t) from l(D\p, f) and cr stat from 
l s t a t(D\p, f), the systematic uncertainty (cT sys ) can be deduced from 



OSys = \J°tot - CF stat - (12) 

Notice we have assumed that the l sta t{D\p, f) and l(D\p, /) both have maxima 
at the same p max and f max - This is not the case in general. However, in the 
instance where the maxima do not occur at the same p ma x and fmax, the 
difference arises from choosing what is in fact an arbitrary definition of the 
statistical uncertainty, l sta t(D\p, f). 

Figure 5 shows the distributions of p max —ptrue and f max — ftrue histogrammed 
in blue. The means of these distributions do not deviate more than 1.3% 
for protons and iron - an insignificant deviation considering the mean un- 
certainties in Fig. 6. Figure 6 shows the distribution of \p max — ptrue\/o~ p anc ^ 
\fmax — /true I /c/ where o~ v and oj are the uncertainties for p and /, respec- 
tively. As expected, 68% of the predictions for p and / deviate less than la 
from the Pt rue and ftrue- Figures 7 and 8 shows the relative error for protons 
and iron, respectively. The most probable error for p and / is ~ 13%. 

We have tested the method with other mixtures of iron and protons and find 
that it gives reasonable rsults for all two-component mixtures. For instance, in 
Fig. 9 we find that the method still gives reasonable uncertainties in the case 
where there is an 80:20 protomiron mixture in a 200 event parent distribution. 
The mode of the fractional error is ~ 8% (~ 28%) for protons (iron). 



4-1 Eliminating Spectral Dependence in the Likelihood 



In practice, the energy spectra for protons and iron are unknown. To include 
our lack of prior knowledge of the energy spectrum, a slightly more sophisti- 
cated likelihood is used. 

The method allows us to float a p and a/. The likelihood can be maximized as 
a function of a p and cc/ along with p and /, but, due to limited statistics in the 
hypothesized simulated samples, we choose to marginalize these parameters. 
In principle, one would integrate over an infinite number of spectra. However, 
the CPU time for such a calculation is unrealistic, so here these spectra will 
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Fig. 5. Histograms of p-ptrue (top) and f-ftrue (bottom) for ptrue = 100 and 
ftrue = 100. The green (blue) histograms were calculated with Eq. (15) (Eq. (5)). 
The black histogram makes use of Eq. (15) that is applied to a parent data sample 
made of 250 events with 100 proton, 100 iron and 50 gammas. The red histogram 
includes an additional term in Eq. (15) to account for gammas. The histograms 
have overflow events (e.g., most of the events in the black histogram lie outside the 
abscissa limits). 



be marginalized using E 2 and E 3 spectra. Eq. (2) then becomes 



P(pJ\D) 



Ea p =-2,-3 Ea f =-2,-3 P(D\P, f, ® P i "/MP, /, "p, «/) 



E 



p,f,a p ,a f 



P{D\P, /, Op, Oi f )q(p, f, op, a f ) 



(13) 



Other priors such as those concerning the interaction model can be introduced 
and marginalized in a similar manner. Therefore, maximizing the probability 
P(p,f\D) is to maximize the likelihood 
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Fig. 6. Deviation from ptrue and ftrue expressed relative to the proton and iron 
uncertainties, a p and af, respectively. The green (blue) histograms were calculated 
with Eq. (15) (Eq. (5)). The black histogram makes use of Eq. (15) that is applied to 
a parent data sample made of 250 events with 100 proton, 100 iron and 50 gammas. 
The red histogram includes an additional term in Eq. (15) to account for gammas. 
The histograms have overflow events (e.g., many of the events in the black histogram 
lie outside the abscissa limits). 
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= l(D\{p l },{f l },e p ,e f ). 



(14) 



or 



l(D\e p ,e f )= J2 Y, II /* J d ^\ /): 



ii„ = -2,-3qi = -2,-3 i=l 
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Fig. 7. Fractional uncertainty for the mean number of protons (p). The green (blue) 
histograms were calculated with Eq. (15) (Eq. (5)). The black histogram makes use 
of Eq. (15) that is applied to a parent data sample made of 250 events with 100 
proton, 100 iron and 50 gammas. The red histogram includes an additional term in 
Eq. (15) to account for gammas. The histograms have overflow events. 



We can see that the introduction of priors does not change the distributions 
in Figs. 5 and 6 substantially as seen by comparing the blue and green his- 
tograms. Its effect can be best seen by the upward shift in the mean relative 
uncertainty (between the blue and green histograms) in Fig. 7 and 8. 

We conclude that the method gives an accurate prediction of the mean number 
of protons and iron. For instance, with 100 proton and 100 iron events in a 
data sample with primary energies over 10 19 eV, the uncertainty in p and / 
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Fig. 8. Fractional uncertainty for the mean number of iron (f). The green (blue) 
histograms were calculated with Eq. (15) (Eq. (5)). The black histogram makes use 
of Eq. (15) that is applied to a parent data sample made of 250 events with 100 
proton, 100 iron and 50 gammas. The red histogram includes an additional term in 
Eq. (15) to account for gammas. The histograms have overflow events. 



is ~ 12% with a better chance of obtaining a larger uncertainty. This increase 
is expected as we have introduced some "ignorance" in the choice of spectra. 
Further, the likelihood is able to accurately predict the 1 a uncertainties. 

In the instance where we wish to sum over spectra, we would ideally inte- 
grate over every possible spectrum for protons and iron. Computationally, 
this is obviously impossible. Instead, we compromise by summing over "ex- 
tremes" of what we would expect for protons and iron. In this first approxi- 



mation, we calculate L(D\e p , e/, a p 



2, a f = 3), 
L(D\e p , €f, a p = 3, a/ = 2) and L(D\e p , e/, a p = 3, a/ = 3). That is, we cal- 
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Fig. 9. Fractional uncertainty for p (black) and f (red) calculated for simulated 
data sets with a parent sample comprised of 160 proton and 40 iron events. 



culate the likelihood with every combination of spectra for the proton and 
iron distributions. Then, we sum these 4 likelihood and maximize the sum 
with respect to e p and €f. Had we summed over generators (say QGSJET and 
SIBYLL), we would sum over 8 likelihoods where each likelihood is calculated 
with a different combination of spectra and hadronic generators. The calcula- 
tion of uncertainties are made in the usual way with the summed likelihood. 



5 Discussion 



We now investigate the case where the sample made of proton and iron is 
"contaminated" with other components. We start the discussion with gamma 
primaries. AGASA has estimated the upper limit in the 7-ray flux to be 28% 
for events above 10 19 eV [16]. We therefore create a 250 events sample with 100 
protons, 100 iron, 50 gammas. The hypothesized distributions for gammas are 
based on 1352 and 490 gammas with E~ 2 and E~ 3 , respectively. We use the 
gammas with an E~ 3 spectrum to generate the parent distribution. The result 
of the contamination is a large bias in p and / found in the black histogram 
in Fig. 5 and wrong uncertainties as seen in Figs. 6, 7 and 8. However, if the 
likelihood is modified to 
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where e g , Gi, etc. carry the same definitions for gammas as e p p, Pi, etc. carry 
for protons, we see that the uncertainties once again make sense and the bias 
becomes negligible as shown by the red histograms is Figs. 5 and 6. In Figs. 7 
and 8 we see an increase in the relative uncertainties for iron while the proton 
uncertainties do not change significantly. An increase in the uncertainies would 
not be surprising as there is an overlap in the X max distributions between 
protons, iron and gammas. The lack of significant change is a result of the 
right combination of protons with an E~ 2 spectrum and iron with an E~ 2 
spectrum having a much larger likelihood value than the alternatives in the 
summation of a p and a/. This demonstration shows that one must account 
for gamma-like primaries if there is a reasonable expectation that they may 
be in the data. 

Accounting for all possible contributions from every imaginable primary is 
clearly impossible. A number of questions then arise. For instance, is it sufncent 
to assume a proton and iron mixture when analyzing a real data distribution? 
Is it necessary to assume, for instance, a proton, helium, carbon and iron 
mixture? Would such complex mixtures be more or less useful than simply 
quoting the result assuming a proton and iron mixture? 

These questions can be approached in two ways. If we had good reason to 
believe ultrahigh energy cosmic rays were composed exclusively of (say) pro- 
tons, carbon and iron, then we would consider it useful to measure the relative 
contributions from these three elements regardless of their uncertainties. 

On the other hand, if we consider protons, carbon and iron as stand-ins for 
"light," "medium" sized and "heavy" primaries, then we can ask how sensitive 
the likelihood method is to these elements. That is, if there were something 
resembling carbon mixed with protons and iron, could it be detected to any 
accuracy? 

In the latter vein, we build a "fake" data sample with 100 proton, 50 carbon 
and 100 iron primaries. We then apply the full likelihood in Eq. (16), replacing 
the gamma term with carbon, and evaluate the distribution of uncertainties 
(Fig. 10). Although we find the uncertainties are meaningful, the large un- 
certainties for carbon indicate that the resolution is not sufficient to make 
statements about the fraction of medium-sized primaries in this type of data 
sample by inserting the carbon hypothesis. From this perspective, it is less 
important to account for carbon primaries than gamma primaries. 

We also study the case where two hypothesized elements are indistinguish- 
able. We create a fake data set of 250 events with 150 proton and 100 iron 
events. A proton distribution, a second identical proton distribution and an 
iron distribution act as hypotheses. That is, the likelihood is calculated using 
Eq. (16) substituting a second proton distribution for the gamma distribution. 




Relative Error 
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Fig. 10. Distribution of the relative errors for proton, carbon and iron hypotheses 
evaluated using Eq. (16) over simulated data sets fluctuated from a parent sample of 
100 proton, 50 carbon and 100 iron events. 



The algorithm that maximizes the likelihood, unable to distinguish how many 
of the 150 proton are attributable to the first proton distribution as opposed 
to the second, settles on an arbitrary number of proton events between and 
~ 150 for the first proton hypothesis (pi). What remains of ~ 150 events is 
attributed to the second proton hypothesis {p%). The uncertainties are then 
calculated for p\ and p 2 by fixing the number of events for p\ (p 2 ) fluctuating 
Pi (pi) ± la according to Eq. (9). As there are large correlations between p\ 
and P2 that are not considered when calculating the uncertainties in this way, 
the method underestimates the uncertainties (1 a corresponds to 15% confi- 
dence region). Therefore, before an element is inserted as a hypothesis, the 
correlations between the various distributions need to be understood or one 
needs to verify that the distributions are sufficiently uncorrelated such that 
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the results have meaningful uncertainties. 

Also, one may consider other hadronic models. In principle, the hadronic mod- 
els can be marginalized by simply "summing" over them. X max distributions 
for protons, iron and gammas with E~ 2 and E~ 3 spectra would be generated 
using QGSJET and SIBYLL hadronic models. The complexity of the likeli- 
hood and the amount of Monte Carlo increases sharply with the number of 
priors. For the sake of time and simplicity the full calculation is not made here. 
However, to show that it is important to consider different hadronic models, 
we generate a simulated data set with 100 proton and 100 iron events with the 
SIBYLL hadronic model. We then fluctate this distribution many times in the 
same fashion described above each time calculating the maximum likelihood 
using proton and iron events generated with QGSJET. Figure 11 shows the 
~ 60% shift in the mean number of protons and iron resulting from calculating 
the likelihood using QGSJET. Regardless, this shift shows that the composi- 
tion fractions cannot be properly calculated without considering at least the 
QGSJET and SIBYLL hadronic models. 

We have discussed widening the measurement to include such knowledge and 
given an example of how one might introduce prior knowledge by introducing 
and then marginalizing the unknown spectral dependence of iron and protons. 
Other parameters concerning the extensive air shower and atmospheric uncer- 
tainties can be introduced and marginalized provided sufficient data and CPU 
time. In so doing, uncertainties are rigorously compounded in the calculation. 

Further, we have investigated the consequence of contaminating the simulated 
data with gammas and carbon. Proton, iron and gamma hypotheses are suf- 
ficient if one is only interested in stand-ins for "heavy," "light," and "lighter" 
elements. However, it is important to consider all elements that one reasonably 
expects to be in the data. 

This method of measuring the composition is different from the method us- 
ing elongation rate. It uses the full distribution of X max to determine the 
composition as opposed to the elongation rate which uses the mean. Also, the 
measured elongation is independent of any model assumptions although its in- 
terpretation requires one to compare the measurement to what is expected for 
various compositions. The method is currently being applied to HiRes stereo 
events and results will be presented in a separate paper where the composition 
will be evaluated for events in intervals of 10 18 ' 5 eV, 10 190 eV, 10 19 ' 5 eV, etc. 
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